Context
Imagine driving through town and a coupon is delivered to your cell phone for a restaurant near where you are driving. Would you accept that coupon and take a short detour to the restaurant? Would you accept the coupon but use it on a subsequent trip? Would you ignore the coupon entirely? What if the coupon were for a bar instead of a restaurant? What about a coffee house? Would you accept a bar coupon with a minor passenger in the car? What if it were just you and your partner in the car? Would weather impact the rate of acceptance? What about the time of day?
Obviously, proximity to the business is a factor in whether the coupon is delivered to the driver, but what factors determine whether a driver accepts the coupon once it is delivered? How would you determine whether a driver is likely to accept a coupon?
Overview
The goal of this project is to use what you know about visualizations and probability distributions to distinguish between customers who accepted a driving coupon and those who did not.
Data
This data comes to us from the UCI Machine Learning repository and was collected via a survey on Amazon Mechanical Turk. The survey describes different driving scenarios, including the destination, current time, weather, passenger, etc., and then asks whether the person would accept the coupon if they were the driver. Answers of 'right away' or 'later before the coupon expires' are labeled 'Y = 1', and answers of 'no, I do not want the coupon' are labeled 'Y = 0'. There are five different types of coupons -- less expensive restaurants (under $20), coffee houses, carry out & take away, bars, and more expensive restaurants ($20 - $50).
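The labeling scheme above can be sketched as a tiny helper. This is only an illustration of the mapping described in the text, not code from the dataset pipeline; the `label_answer` name is hypothetical.

```python
def label_answer(answer: str) -> int:
    """Map a survey response to the target label Y (1 = accept, 0 = reject)."""
    # 'right away' and 'later before the coupon expires' both count as acceptance
    return 0 if answer == 'no, I do not want the coupon' else 1

print(label_answer('right away'))                    # 1
print(label_answer('no, I do not want the coupon'))  # 0
```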
Deliverables
Your final product should be a brief report that highlights the differences between customers who did and did not accept the coupons. To explore the data you will utilize your knowledge of plotting, statistical summaries, and visualization using Python. You will publish your findings in a public facing github repository as your first portfolio piece.
Keep in mind that these values mentioned below are average values.
The attributes of this data set include:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
1. Read in the coupons.csv file into a DataFrame.
df = pd.read_csv('data/coupons.csv')
df.head()
| destination | passanger | weather | temperature | time | coupon | expiration | gender | age | maritalStatus | ... | CoffeeHouse | CarryAway | RestaurantLessThan20 | Restaurant20To50 | toCoupon_GEQ5min | toCoupon_GEQ15min | toCoupon_GEQ25min | direction_same | direction_opp | Y | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | No Urgent Place | Alone | Sunny | 55 | 2PM | Restaurant(<20) | 1d | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 0 | 0 | 0 | 1 | 1 |
| 1 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Coffee House | 2h | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 0 | 0 | 0 | 1 | 0 |
| 2 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Carry out & Take away | 2h | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 1 |
| 3 | No Urgent Place | Friend(s) | Sunny | 80 | 2PM | Coffee House | 2h | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 0 |
| 4 | No Urgent Place | Friend(s) | Sunny | 80 | 2PM | Coffee House | 1d | Female | 21 | Unmarried partner | ... | never | NaN | 4~8 | 1~3 | 1 | 1 | 0 | 0 | 1 | 0 |
5 rows × 26 columns
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 12684 entries, 0 to 12683 Data columns (total 26 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 destination 12684 non-null object 1 passanger 12684 non-null object 2 weather 12684 non-null object 3 temperature 12684 non-null int64 4 time 12684 non-null object 5 coupon 12684 non-null object 6 expiration 12684 non-null object 7 gender 12684 non-null object 8 age 12684 non-null object 9 maritalStatus 12684 non-null object 10 has_children 12684 non-null int64 11 education 12684 non-null object 12 occupation 12684 non-null object 13 income 12684 non-null object 14 car 108 non-null object 15 Bar 12577 non-null object 16 CoffeeHouse 12467 non-null object 17 CarryAway 12533 non-null object 18 RestaurantLessThan20 12554 non-null object 19 Restaurant20To50 12495 non-null object 20 toCoupon_GEQ5min 12684 non-null int64 21 toCoupon_GEQ15min 12684 non-null int64 22 toCoupon_GEQ25min 12684 non-null int64 23 direction_same 12684 non-null int64 24 direction_opp 12684 non-null int64 25 Y 12684 non-null int64 dtypes: int64(8), object(18) memory usage: 2.5+ MB
There is a typo in the passenger column name; let's fix that.
df.rename(columns={"passanger": "passenger"}, inplace=True)
There are a few column names in camel case or starting with an uppercase letter. Let's change those to lowercase, adding an underscore where required, so that the column names are more consistent and easier to work with.
df = df.rename(columns={"maritalStatus": "marital_status",
"Bar" : "bar",
"CoffeeHouse": "coffee_house",
"CarryAway":"takeaway",
"RestaurantLessThan20" : "restaurant_lt_20" ,
"Restaurant20To50" : "restaurant_gt_20",
"Y" : "coupon_accepted"})
2. Investigate the dataset for missing or problematic data.
#Find columns with null values and count of values missing
empty_columns = df.columns[df.isna().sum() > 0]
df[empty_columns].isnull().sum()
car 12576 bar 107 coffee_house 217 takeaway 151 restaurant_lt_20 130 restaurant_gt_20 189 dtype: int64
#Explore columns with missing data
print(df['car'].value_counts())
print(df['bar'].value_counts())
print(df['coffee_house'].value_counts())
print(df['takeaway'].value_counts())
print(df['restaurant_lt_20'].value_counts())
print(df['restaurant_gt_20'].value_counts())
Scooter and motorcycle 22 Mazda5 22 do not drive 22 crossover 21 Car that is too old to install Onstar :D 21 Name: car, dtype: int64 never 5197 less1 3482 1~3 2473 4~8 1076 gt8 349 Name: bar, dtype: int64 less1 3385 1~3 3225 never 2962 4~8 1784 gt8 1111 Name: coffee_house, dtype: int64 1~3 4672 4~8 4258 less1 1856 gt8 1594 never 153 Name: takeaway, dtype: int64 1~3 5376 4~8 3580 less1 2093 gt8 1285 never 220 Name: restaurant_lt_20, dtype: int64 less1 6077 1~3 3290 never 2136 4~8 728 gt8 264 Name: restaurant_gt_20, dtype: int64
#The values in the car column do not seem to be of any use for this analysis; drop the column
df = df.drop(columns=['car'])
#Check the dataframe when bar is null
df[pd.isnull(df['bar'])].iloc[0:,0:20].head()
| destination | passenger | weather | temperature | time | coupon | expiration | gender | age | marital_status | has_children | education | occupation | income | bar | coffee_house | takeaway | restaurant_lt_20 | restaurant_gt_20 | toCoupon_GEQ5min | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 517 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Bar | 1d | Male | 50plus | Divorced | 1 | Some college - no degree | Management | 74999 | NaN | NaN | NaN | NaN | NaN | 1 |
| 518 | No Urgent Place | Friend(s) | Sunny | 80 | 2PM | Coffee House | 1d | Male | 50plus | Divorced | 1 | Some college - no degree | Management | 74999 | NaN | NaN | NaN | NaN | NaN | 1 |
| 519 | No Urgent Place | Friend(s) | Sunny | 80 | 2PM | Restaurant(<20) | 1d | Male | 50plus | Divorced | 1 | Some college - no degree | Management | 74999 | NaN | NaN | NaN | NaN | NaN | 1 |
| 520 | No Urgent Place | Friend(s) | Sunny | 80 | 6PM | Coffee House | 2h | Male | 50plus | Divorced | 1 | Some college - no degree | Management | 74999 | NaN | NaN | NaN | NaN | NaN | 1 |
| 521 | No Urgent Place | Friend(s) | Sunny | 55 | 2PM | Coffee House | 2h | Male | 50plus | Divorced | 1 | Some college - no degree | Management | 74999 | NaN | NaN | NaN | NaN | NaN | 1 |
Since the bar column signifies the number of times a customer goes to a bar, we do not have a way to accurately fill in the missing data. Assuming the missing data means 'never' could skew the data unfavorably, so we drop the rows where this data is missing. The same logic applies to the other columns, i.e. coffee_house, takeaway, restaurant_lt_20, and restaurant_gt_20.
df = df.dropna(subset=['bar','coffee_house','takeaway','restaurant_lt_20','restaurant_gt_20'])
#Verify that there are no more missing values in the dataframe
#Print all columns with nan values along with the count
print(df[df.columns[df.isna().sum() > 0]].isnull().sum())
#Check how many records were dropped in total
print('Dropped row count : ' + str(12684 - df['destination'].size))
Series([], dtype: float64) Dropped row count : 605
df['income'].value_counts()
$25000 - $37499 1919 $12500 - $24999 1728 $100000 or More 1692 $37500 - $49999 1689 $50000 - $62499 1565 Less than $12500 1014 $62500 - $74999 840 $87500 - $99999 818 $75000 - $87499 814 Name: income, dtype: int64
Let's make the ranges more uniform.
df['income'] = df['income'].str.replace("$", "", regex=False)
df['income'].replace(to_replace='Less than 12500',value='0 - 12500',inplace=True)
df['income'].replace(to_replace='100000 or More',value='100000 - 200000',inplace=True)
df['income'].value_counts()
25000 - 37499 1919 12500 - 24999 1728 100000 - 200000 1692 37500 - 49999 1689 50000 - 62499 1565 0 - 12500 1014 62500 - 74999 840 87500 - 99999 818 75000 - 87499 814 Name: income, dtype: int64
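If a numeric income were ever needed (it is not used below, where the cleaned ranges stay as categories), one sketch is to map each cleaned `low - high` string to its midpoint:

```python
import pandas as pd

# A few of the cleaned income-range strings from above
s = pd.Series(['25000 - 37499', '0 - 12500', '100000 - 200000'])
bounds = s.str.split(' - ', expand=True).astype(int)  # columns: low, high
midpoint = (bounds[0] + bounds[1]) // 2
print(midpoint.tolist())  # [31249, 6250, 150000]
```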
df['education'].value_counts()
Some college - no degree 4219 Bachelors degree 4021 Graduate degree (Masters or Doctorate) 1808 Associates degree 1066 High School Graduate 877 Some High School 88 Name: education, dtype: int64
Some of these values can be merged into a single value:
df['education'].replace(to_replace='Some college - no degree',value='High School Graduate',inplace=True)
df['education'].replace(to_replace='Associates degree',value='Bachelors degree',inplace=True)
df['education'].replace(to_replace='Some High School',value='High School Graduate',inplace=True)
df['education'].value_counts()
High School Graduate 5184 Bachelors degree 5087 Graduate degree (Masters or Doctorate) 1808 Name: education, dtype: int64
print(df['age'].value_counts())
"""
Lets change the strings below21 and 50plus to numbers so that we can treat age as a numeric value. For simplicity,
I am chaging below 21 to 16 and 50plus to 51
"""
df['age'].replace(to_replace='below21',value=16,inplace=True)
df['age'].replace(to_replace='50plus',value=51,inplace=True)
df['age'] = df['age'].astype(int)
df.info()
21 2537 26 2399 31 1925 50plus 1732 36 1253 41 1065 46 664 below21 504 Name: age, dtype: int64 <class 'pandas.core.frame.DataFrame'> Int64Index: 12079 entries, 22 to 12683 Data columns (total 25 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 destination 12079 non-null object 1 passenger 12079 non-null object 2 weather 12079 non-null object 3 temperature 12079 non-null int64 4 time 12079 non-null object 5 coupon 12079 non-null object 6 expiration 12079 non-null object 7 gender 12079 non-null object 8 age 12079 non-null int64 9 marital_status 12079 non-null object 10 has_children 12079 non-null int64 11 education 12079 non-null object 12 occupation 12079 non-null object 13 income 12079 non-null object 14 bar 12079 non-null object 15 coffee_house 12079 non-null object 16 takeaway 12079 non-null object 17 restaurant_lt_20 12079 non-null object 18 restaurant_gt_20 12079 non-null object 19 toCoupon_GEQ5min 12079 non-null int64 20 toCoupon_GEQ15min 12079 non-null int64 21 toCoupon_GEQ25min 12079 non-null int64 22 direction_same 12079 non-null int64 23 direction_opp 12079 non-null int64 24 coupon_accepted 12079 non-null int64 dtypes: int64(9), object(16) memory usage: 2.4+ MB
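The two `replace` calls above can equivalently be collapsed into a single dict-based call; the 16/51 stand-ins are the same simplification chosen above. A self-contained sketch on a toy series:

```python
import pandas as pd

age = pd.Series(['below21', '21', '50plus', '36'])
# One replace call with a mapping dict, then cast to int
age = age.replace({'below21': '16', '50plus': '51'}).astype(int)
print(age.tolist())  # [16, 21, 51, 36]
```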
Let's explore the direction columns.
print(df['direction_opp'].value_counts())
print(df['direction_same'].value_counts())
#The values in the two columns seem to be directly related to each other
print(df.query('direction_opp == 1')['direction_same'].value_counts())
df.query('direction_same == 1')['direction_opp'].value_counts()
#Let's combine the two columns into one - direction.
#A value of 0 represents going in the same direction, whereas 1 represents the opposite direction
df['direction'] = np.where(df['direction_same'] == 1, 0 , 1)
#verify the value in the new column
df
#Drop the two redundant columns
df = df.drop(columns=['direction_same', 'direction_opp'])
df
1 9480 0 2599 Name: direction_opp, dtype: int64 0 9480 1 2599 Name: direction_same, dtype: int64 0 9480 Name: direction_same, dtype: int64
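The value counts above suggest the two flags are exact complements. A sketch of a direct check (on a toy frame, since the real columns are dropped above) before discarding one of them:

```python
import pandas as pd

df_dir = pd.DataFrame({'direction_same': [1, 0, 0, 1],
                       'direction_opp':  [0, 1, 1, 0]})
# Every row should have exactly one of the two flags set
assert (df_dir['direction_same'] + df_dir['direction_opp'] == 1).all()
print('columns are complementary')
```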
| destination | passenger | weather | temperature | time | coupon | expiration | gender | age | marital_status | ... | bar | coffee_house | takeaway | restaurant_lt_20 | restaurant_gt_20 | toCoupon_GEQ5min | toCoupon_GEQ15min | toCoupon_GEQ25min | coupon_accepted | direction | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22 | No Urgent Place | Alone | Sunny | 55 | 2PM | Restaurant(<20) | 1d | Male | 21 | Single | ... | never | less1 | 4~8 | 4~8 | less1 | 1 | 0 | 0 | 1 | 1 |
| 23 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Coffee House | 2h | Male | 21 | Single | ... | never | less1 | 4~8 | 4~8 | less1 | 1 | 0 | 0 | 0 | 1 |
| 24 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Bar | 1d | Male | 21 | Single | ... | never | less1 | 4~8 | 4~8 | less1 | 1 | 0 | 0 | 1 | 1 |
| 25 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Carry out & Take away | 2h | Male | 21 | Single | ... | never | less1 | 4~8 | 4~8 | less1 | 1 | 1 | 0 | 0 | 1 |
| 26 | No Urgent Place | Friend(s) | Sunny | 80 | 2PM | Coffee House | 1d | Male | 21 | Single | ... | never | less1 | 4~8 | 4~8 | less1 | 1 | 0 | 0 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12679 | Home | Partner | Rainy | 55 | 6PM | Carry out & Take away | 1d | Male | 26 | Single | ... | never | never | 1~3 | 4~8 | 1~3 | 1 | 0 | 0 | 1 | 0 |
| 12680 | Work | Alone | Rainy | 55 | 7AM | Carry out & Take away | 1d | Male | 26 | Single | ... | never | never | 1~3 | 4~8 | 1~3 | 1 | 0 | 0 | 1 | 1 |
| 12681 | Work | Alone | Snowy | 30 | 7AM | Coffee House | 1d | Male | 26 | Single | ... | never | never | 1~3 | 4~8 | 1~3 | 1 | 0 | 0 | 0 | 0 |
| 12682 | Work | Alone | Snowy | 30 | 7AM | Bar | 1d | Male | 26 | Single | ... | never | never | 1~3 | 4~8 | 1~3 | 1 | 1 | 1 | 0 | 1 |
| 12683 | Work | Alone | Sunny | 80 | 7AM | Restaurant(20-50) | 2h | Male | 26 | Single | ... | never | never | 1~3 | 4~8 | 1~3 | 1 | 0 | 0 | 0 | 0 |
12079 rows × 24 columns
The coupon_accepted column has an int value of 0 or 1. Let's add another column with Yes/No so that it's easier to read in charts.
df['coupon_accepted_str'] = np.where(df['coupon_accepted'] == 0, 'No', 'Yes')
The columns recording how often a driver visits various establishments (i.e. bar, coffee house, etc.) hold strings describing a visits-per-month range. Let's normalize these labels so they are easier to read and analyze: treat 'less1' the same as 'never' (both become 'Never'), and rename 'gt8' to the more readable '8 or more'.
df.replace(to_replace='less1',value='Never', inplace=True)
df.replace(to_replace='never',value='Never', inplace=True)
df.replace(to_replace='gt8',value='8 or more', inplace=True)
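Since these visit-frequency labels are ordinal, one optional refinement (not applied in this notebook) is to encode them as an ordered categorical so sorts and plots respect the natural order instead of the alphabetical one:

```python
import pandas as pd

order = ['Never', '1~3', '4~8', '8 or more']
s = pd.Series(['4~8', 'Never', '8 or more', '1~3'])
s = pd.Categorical(s, categories=order, ordered=True)
print(list(s.sort_values()))  # ['Never', '1~3', '4~8', '8 or more']
```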
For the columns toCoupon_GEQ5min, toCoupon_GEQ15min, and toCoupon_GEQ25min, I want to analyze whether we can combine them into a single column. These columns denote how far the driver is from the coupon venue.
toCoupon_GEQ5min = df['toCoupon_GEQ5min'].value_counts()
print(f'toCoupon_GEQ5min \n {toCoupon_GEQ5min} \n')
toCoupon_GEQ15min = df['toCoupon_GEQ15min'].value_counts()
print(f'toCoupon_GEQ15min \n {toCoupon_GEQ15min} \n')
toCoupon_GEQ25min = df['toCoupon_GEQ25min'].value_counts()
print(f'toCoupon_GEQ25min \n {toCoupon_GEQ25min} \n')
print(df.shape)
toCoupon_GEQ5min 1 12079 Name: toCoupon_GEQ5min, dtype: int64 toCoupon_GEQ15min 1 6779 0 5300 Name: toCoupon_GEQ15min, dtype: int64 toCoupon_GEQ25min 0 10637 1 1442 Name: toCoupon_GEQ25min, dtype: int64 (12079, 25)
All the rows have 1 for toCoupon_GEQ5min, meaning every location is at least 5 minutes away. 6779 rows are at least 15 minutes away and 1442 are at least 25 minutes away.
That leaves 12079 - 6779 = 5300 locations between 5 and 15 minutes away, 6779 - 1442 = 5337 locations between 15 and 25 minutes away, and 1442 locations more than 25 minutes away.
Let's add a new column replacing these three that lists the driving time. For simplicity, we will record 10 mins for places 5-15 mins away, 20 mins for places 15-25 mins away, and 30 mins for places more than 25 mins away.
def get_distance(row):
    # Map the three indicator columns to a single driving time in minutes
    if row['toCoupon_GEQ15min'] == 0:
        return 10   # 5-15 mins away (toCoupon_GEQ5min is always 1)
    if row['toCoupon_GEQ25min'] == 1:
        return 30   # more than 25 mins away
    return 20       # 15-25 mins away
df['driving_time'] = df.apply(get_distance , axis = 1)
df['driving_time'].value_counts()
#drop redundant columns
df = df.drop(columns=['toCoupon_GEQ5min','toCoupon_GEQ15min' ,'toCoupon_GEQ25min'])
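The row-wise `apply` above can also be expressed as a vectorized `np.select`, which is typically faster on large frames. A sketch on a toy frame (the real indicator columns are dropped above):

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'toCoupon_GEQ15min': [0, 1, 1],
                  'toCoupon_GEQ25min': [0, 0, 1]})
# 5-15 mins -> 10, over 25 mins -> 30, otherwise (15-25 mins) -> 20
d['driving_time'] = np.select(
    [d['toCoupon_GEQ15min'] == 0, d['toCoupon_GEQ25min'] == 1],
    [10, 30],
    default=20)
print(d['driving_time'].tolist())  # [10, 20, 30]
```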
Let's look at the final data set.
df
| destination | passenger | weather | temperature | time | coupon | expiration | gender | age | marital_status | ... | income | bar | coffee_house | takeaway | restaurant_lt_20 | restaurant_gt_20 | coupon_accepted | direction | coupon_accepted_str | driving_time | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22 | No Urgent Place | Alone | Sunny | 55 | 2PM | Restaurant(<20) | 1d | Male | 21 | Single | ... | 62500 - 74999 | Never | Never | 4~8 | 4~8 | Never | 1 | 1 | Yes | 10 |
| 23 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Coffee House | 2h | Male | 21 | Single | ... | 62500 - 74999 | Never | Never | 4~8 | 4~8 | Never | 0 | 1 | No | 10 |
| 24 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Bar | 1d | Male | 21 | Single | ... | 62500 - 74999 | Never | Never | 4~8 | 4~8 | Never | 1 | 1 | Yes | 10 |
| 25 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Carry out & Take away | 2h | Male | 21 | Single | ... | 62500 - 74999 | Never | Never | 4~8 | 4~8 | Never | 0 | 1 | No | 20 |
| 26 | No Urgent Place | Friend(s) | Sunny | 80 | 2PM | Coffee House | 1d | Male | 21 | Single | ... | 62500 - 74999 | Never | Never | 4~8 | 4~8 | Never | 0 | 1 | No | 10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12679 | Home | Partner | Rainy | 55 | 6PM | Carry out & Take away | 1d | Male | 26 | Single | ... | 75000 - 87499 | Never | Never | 1~3 | 4~8 | 1~3 | 1 | 0 | Yes | 10 |
| 12680 | Work | Alone | Rainy | 55 | 7AM | Carry out & Take away | 1d | Male | 26 | Single | ... | 75000 - 87499 | Never | Never | 1~3 | 4~8 | 1~3 | 1 | 1 | Yes | 10 |
| 12681 | Work | Alone | Snowy | 30 | 7AM | Coffee House | 1d | Male | 26 | Single | ... | 75000 - 87499 | Never | Never | 1~3 | 4~8 | 1~3 | 0 | 0 | No | 10 |
| 12682 | Work | Alone | Snowy | 30 | 7AM | Bar | 1d | Male | 26 | Single | ... | 75000 - 87499 | Never | Never | 1~3 | 4~8 | 1~3 | 0 | 1 | No | 30 |
| 12683 | Work | Alone | Sunny | 80 | 7AM | Restaurant(20-50) | 2h | Male | 26 | Single | ... | 75000 - 87499 | Never | Never | 1~3 | 4~8 | 1~3 | 0 | 0 | No | 10 |
12079 rows × 23 columns
Let's explore the data set and see what the data looks like by visualizing a few attributes.
Age distribution of the customers
#Plot a histogram for age
fig = px.histogram(df, x='age', title='Age Distribution', labels={'age':'Age'})
fig.update_layout( yaxis_title="Count")
fig.show()
Income distribution of the customers
#Income Distribution
fig = px.histogram(df['income'].astype('str'), x='income', title='Income Distribution').update_xaxes(categoryorder='total ascending')
fig.update_layout( xaxis_title="Income Range in $" ,yaxis_title="Count")
fig.update_xaxes(tickangle = 45,title_standoff = 100)
fig.show()
What proportion of the total observations chose to accept the coupon?
coupons_accepted = df.query('coupon_accepted_str == "Yes"')['coupon_accepted_str'].count()
total_coupons = df['coupon_accepted_str'].count()
coupons_accepted_pct = round(coupons_accepted/total_coupons*100,2)
print('Proportion to accept coupon : ' + str(coupons_accepted_pct) + '%')
fig = px.histogram(df['coupon_accepted_str'], x="coupon_accepted_str", histnorm='percent',
labels={'coupon_accepted_str':'Coupon Accepted'}, text_auto=True, title='Coupon Acceptance')
fig.update_layout( yaxis_title="Percentage")
fig.show()
Proportion to accept coupon : 56.93%
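Since `coupon_accepted` is already a 0/1 column, the same proportion can also be read off as the column mean. A self-contained sketch:

```python
import pandas as pd

accepted = pd.Series([1, 0, 1, 1, 0])  # toy 0/1 acceptance column
print(round(accepted.mean() * 100, 2))  # 60.0
```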
Use a bar plot to visualize the coupon column.
fig = px.histogram(df, x="coupon", color="coupon_accepted_str",
labels={'coupon':'Issued Coupons','coupon_accepted_str':'Coupon Accepted'},
title='Coupons Issued',
text_auto=True).update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Count")
fig.show()
Observations:
Use a histogram to visualize the temperature column.
fig = px.histogram(df, x="temperature", color="coupon_accepted_str",
labels={'temperature':'Temperature','coupon_accepted_str':'Coupon Accepted'},
title='Temperature based acceptance',
category_orders={"temperature":[30,50,80]},
text_auto=True).update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Count",bargap=0.3)
fig.update_xaxes(type='category')
fig.show()
fig = px.histogram(df, x='occupation', title='Occupation Distribution',
labels={'occupation':'Occupation'}).update_xaxes(categoryorder='total ascending')
fig.update_layout(yaxis_title='Count')
fig.update_xaxes(tickangle = 45)
fig.show()
fig = px.histogram(df, x='destination',color='passenger', title='Destination Distribution',
labels={'destination':'Destination', 'passenger':'Passenger'}).update_xaxes(categoryorder='total ascending')
fig.update_layout(yaxis_title='Count')
fig.update_xaxes(tickangle = 45)
fig.show()
Most drivers are headed to a non-urgent place with friends. This can be an interesting category to analyze.
fig = px.histogram(df, x='driving_time',color='coupon', title='Driving time for Issued coupons',
nbins=3,
labels={'driving_time':'Driving time in mins', 'coupon':'Issued Coupon'}).update_xaxes(categoryorder='total ascending')
fig.update_layout(yaxis_title='Count',bargap=0.3)
fig.update_xaxes(tickangle = 45)
fig.show()
Utility Method
Adding a method that can be reused in the analysis below. It computes the rate of acceptance given a data frame and the columns to group by.
def get_acceptance_pct(df, cols):
coupons_accepted = df.groupby(cols)[["coupon_accepted"]].sum()
coupons_issued = df.groupby(cols)[["coupon_accepted"]].count()
coupons_accepted_pct = round(coupons_accepted/coupons_issued*100,2)
coupons_accepted_pct['coupon_issued'] = coupons_issued['coupon_accepted']
coupons_accepted_pct.rename(columns={"coupon_accepted": "acceptance_pct"}, inplace=True) #rename to %
coupons_accepted_pct['coupon_accepted'] = coupons_accepted['coupon_accepted'] # this is the actual accepted count
return coupons_accepted_pct
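A quick sanity check of the helper on a tiny hand-made frame (the function is redefined here so the sketch is self-contained, with `.round(2)` in place of the builtin `round`):

```python
import pandas as pd

def get_acceptance_pct(df, cols):
    # Acceptance rate plus raw issued/accepted counts per group
    coupons_accepted = df.groupby(cols)[["coupon_accepted"]].sum()
    coupons_issued = df.groupby(cols)[["coupon_accepted"]].count()
    pct = (coupons_accepted / coupons_issued * 100).round(2)
    pct['coupon_issued'] = coupons_issued['coupon_accepted']
    pct.rename(columns={"coupon_accepted": "acceptance_pct"}, inplace=True)
    pct['coupon_accepted'] = coupons_accepted['coupon_accepted']
    return pct

toy = pd.DataFrame({'bar': ['Never', 'Never', '1~3', '1~3'],
                    'coupon_accepted': [0, 1, 1, 1]})
out = get_acceptance_pct(toy, ['bar'])
print(out.loc['Never', 'acceptance_pct'])  # 50.0
print(out.loc['1~3', 'coupon_issued'])     # 2
```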
Let us explore just the bar related coupons.
Create a DataFrame that contains just the bar coupons.
df_bar = df.query('coupon == "Bar"')
df_bar.head()
| destination | passenger | weather | temperature | time | coupon | expiration | gender | age | marital_status | ... | income | bar | coffee_house | takeaway | restaurant_lt_20 | restaurant_gt_20 | coupon_accepted | direction | coupon_accepted_str | driving_time | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 24 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Bar | 1d | Male | 21 | Single | ... | 62500 - 74999 | Never | Never | 4~8 | 4~8 | Never | 1 | 1 | Yes | 10 |
| 35 | Home | Alone | Sunny | 55 | 6PM | Bar | 1d | Male | 21 | Single | ... | 62500 - 74999 | Never | Never | 4~8 | 4~8 | Never | 1 | 0 | Yes | 10 |
| 39 | Work | Alone | Sunny | 55 | 7AM | Bar | 1d | Male | 21 | Single | ... | 62500 - 74999 | Never | Never | 4~8 | 4~8 | Never | 1 | 1 | Yes | 30 |
| 46 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Bar | 1d | Male | 46 | Single | ... | 12500 - 24999 | Never | 4~8 | 1~3 | 1~3 | Never | 0 | 1 | No | 10 |
| 57 | Home | Alone | Sunny | 55 | 6PM | Bar | 1d | Male | 46 | Single | ... | 12500 - 24999 | Never | 4~8 | 1~3 | 1~3 | Never | 0 | 0 | No | 10 |
5 rows × 23 columns
fig = px.histogram(df_bar['coupon_accepted_str'], x="coupon_accepted_str", histnorm='percent',
labels={'coupon_accepted_str':'Coupon Accepted'}, text_auto=True , title='Overall bar coupons acceptance')
fig.update_layout( yaxis_title="Percentage")
fig.show()
A total of 41.19% of bar coupons were accepted by drivers.
#Compute acceptance rate as a % value; We are using the method defined earlier to compute this
bar_coupons_accepted_pct = get_acceptance_pct(df_bar, ['bar'])
bar_coupons_accepted_pct
| acceptance_pct | coupon_issued | coupon_accepted | |
|---|---|---|---|
| bar | |||
| 1~3 | 64.64 | 379 | 245 |
| 4~8 | 77.55 | 147 | 114 |
| 8 or more | 71.74 | 46 | 33 |
| Never | 29.53 | 1341 | 396 |
#use a bar plot to visualize the acceptance based on bar visits
fig = px.histogram(df_bar, x=["bar"], color="coupon_accepted_str",
labels={'value':'Bar visits per month', 'coupon_accepted_str':'Coupon Accepted'},
title='Bar Coupons Acceptance based on bar visits').update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Count")
#Show acceptance percentage value on top of each bar
fig.add_trace(go.Scatter(
x=bar_coupons_accepted_pct.index,
y=bar_coupons_accepted_pct['coupon_issued'],
text=bar_coupons_accepted_pct['acceptance_pct'],
mode='text',
textposition='top center',
textfont=dict(
size=18,
),
showlegend=False
))
fig.show()
This bar chart shows that drivers who never go to bars have a significantly lower acceptance rate for bar coupons.
#Plot bar chart to compare acceptance based on age
df_bar_gt1 = df_bar.query('bar != "Never"')
fig = px.histogram(df_bar_gt1, x="age", color="coupon_accepted_str",
labels={'age':'Age of driver', 'coupon_accepted_str':'Coupon Accepted'},
title='Bar coupons acceptance based on driver age',
barmode='group',
text_auto=True).update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Count")
fig.show()
Drivers over the age of 21 had a higher acceptance rate compared to younger drivers.
#Filter the bar dataset for drivers who visit a bar at least once a month and have an occupation other than farming
df_bar_visit_gt1 = df_bar.query("bar != 'Never'").query('occupation != "Farming Fishing & Forestry"')
#Group based on passenger and bar visits
df_bar_visit_gt1_grp2 = get_acceptance_pct(df_bar_visit_gt1 ,['passenger','bar'] )
df_bar_visit_gt1_grp2
| acceptance_pct | coupon_issued | coupon_accepted | ||
|---|---|---|---|---|
| passenger | bar | |||
| Alone | 1~3 | 65.12 | 215 | 140 |
| 4~8 | 82.00 | 100 | 82 | |
| 8 or more | 73.08 | 26 | 19 | |
| Friend(s) | 1~3 | 82.67 | 75 | 62 |
| 4~8 | 72.00 | 25 | 18 | |
| 8 or more | 68.75 | 16 | 11 | |
| Kid(s) | 1~3 | 32.35 | 34 | 11 |
| 4~8 | 57.14 | 7 | 4 | |
| 8 or more | 100.00 | 1 | 1 | |
| Partner | 1~3 | 58.18 | 55 | 32 |
| 4~8 | 66.67 | 15 | 10 | |
| 8 or more | 66.67 | 3 | 2 |
fig = px.histogram(df_bar_visit_gt1_grp2.reset_index(), x='bar', y='acceptance_pct' , color="passenger",
labels={'bar':'Bar visits per month', 'passenger':'Passenger'},
barmode='group', width=800, height=600, text_auto=True,
title='Bar Coupons Acceptance Rate based on Passenger').update_xaxes(categoryorder='total ascending')
fig.update_xaxes(tickangle=45)
fig.update_layout( yaxis_title="Acceptance %")
In the bar chart above, we can ignore the green bar for 8 or more visits because it is based on a single data point.
The acceptance rate is lowest when drivers travel with kids.
df_bar_visit_gt2 = df_bar.query("bar != 'Never'").query('passenger != "Kid(s)"')
df_bar_visit_gt1_grp2 = get_acceptance_pct(df_bar_visit_gt2, ['occupation'])
fig = px.histogram(df_bar_visit_gt1_grp2.reset_index(), x='occupation', y='acceptance_pct' ,
labels={'occupation':'Occupation', 'passenger':'Passenger'},
barmode='group', width=1000, height=800, text_auto=True,
title='Bar Coupons Acceptance Rate based on Occupation').update_xaxes(categoryorder='total ascending')
fig.update_xaxes(tickangle=45)
fig.update_layout( yaxis_title="Acceptance %")
There are certain occupations where acceptance is much higher, like 'Architecture & Engineering' or 'Healthcare Support'. Income might have a role to play in such cases.
#create a new data set for customers who go to bars more than once a month,
#had passengers that were not a kid, and were not widowed
df2 = df_bar.query("bar!= 'Never'").query("passenger != 'Kid(s)'").query("marital_status != 'Widowed'")
df2_grp = df2.groupby(['passenger','marital_status'])[['coupon_accepted']].count()
df2_grp = df2_grp.rename(columns={'coupon_accepted':'coupon_issued'})
df2_grp['coupon_accepted'] = df2.groupby(['passenger','marital_status'])[['coupon_accepted']].sum()
df2_grp['acceptance_pct'] = round(df2_grp['coupon_accepted']/df2_grp['coupon_issued']*100,2)
df2_grp
| coupon_issued | coupon_accepted | acceptance_pct | ||
|---|---|---|---|---|
| passenger | marital_status | |||
| Alone | Divorced | 10 | 9 | 90.00 |
| Married partner | 77 | 58 | 75.32 | |
| Single | 189 | 137 | 72.49 | |
| Unmarried partner | 65 | 37 | 56.92 | |
| Friend(s) | Divorced | 3 | 2 | 66.67 |
| Married partner | 22 | 16 | 72.73 | |
| Single | 71 | 57 | 80.28 | |
| Unmarried partner | 20 | 16 | 80.00 | |
| Partner | Married partner | 29 | 19 | 65.52 |
| Single | 7 | 5 | 71.43 | |
| Unmarried partner | 37 | 20 | 54.05 |
fig = px.histogram(df2_grp.reset_index(), x='marital_status', y='acceptance_pct' , color="passenger",
labels={'marital_status':'Marital Status', 'passenger':'Passenger'},
barmode='group', width=800, height=600, text_auto=True,
title='Bar Coupons Acceptance Rate based on Marital Status and Passenger').update_xaxes(categoryorder='total ascending')
fig.update_xaxes(tickangle=45)
fig.update_layout( yaxis_title="Acceptance %")
#Create a new data set for customers who go to bars more than once a month and are under the age of 30
df_age_lt30= df_bar.query('bar != "Never"').query('age < 30')
df_age_lt30_grp = df_age_lt30.groupby(['bar','age'])[['coupon_accepted']].count()
df_age_lt30_grp = df_age_lt30_grp.rename(columns={'coupon_accepted':'coupon_issued'})
df_age_lt30_grp['coupon_accepted'] = df_age_lt30.groupby(['bar','age'])[['coupon_accepted']].sum()
df_age_lt30_grp['acceptance_pct'] = round(df_age_lt30_grp['coupon_accepted']/df_age_lt30_grp['coupon_issued']*100,2)
df_age_lt30_grp
| coupon_issued | coupon_accepted | acceptance_pct | ||
|---|---|---|---|---|
| bar | age | |||
| 1~3 | 16 | 6 | 2 | 33.33 |
| 21 | 103 | 67 | 65.05 | |
| 26 | 89 | 63 | 70.79 | |
| 4~8 | 21 | 44 | 33 | 75.00 |
| 26 | 52 | 44 | 84.62 | |
| 8 or more | 16 | 4 | 2 | 50.00 |
| 21 | 12 | 10 | 83.33 | |
| 26 | 18 | 15 | 83.33 |
#Plot acceptance rate for drivers under 30, split by bar visit frequency
fig = px.histogram(df_age_lt30_grp.reset_index(), x='age', y='acceptance_pct' , color="bar",
labels={'age':'Age', 'bar':'Bar visits per month'},
barmode='group', width=800, height=600, text_auto=True,
title='Bar Coupons Acceptance Rate based on age').update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Acceptance %")
#Create a new data set for customers with income less than 50K, grouped by how often they visit cheap restaurants
def income_lt_50k(row):
income_range = row['income']
income_split = income_range.split('-',2)
max_income = int(income_split[1])
return max_income < 50_000
df3 = df_bar[df_bar.apply(income_lt_50k, axis=1)] #Filter dataset based on income
df_bar3 = df3.groupby(['restaurant_lt_20','income'])[['coupon_accepted']].count()
df_bar3 = df_bar3.rename(columns={'coupon_accepted':'coupon_issued'})
df_bar3['coupon_accepted'] = df3.groupby(['restaurant_lt_20','income'])[['coupon_accepted']].sum()
df_bar3['acceptance_pct'] = round(df_bar3['coupon_accepted']/df_bar3['coupon_issued']*100,2)
df_bar3
| restaurant_lt_20 | income | coupon_issued | coupon_accepted | acceptance_pct |
|---|---|---|---|---|
| 1~3 | 0 - 12500 | 53 | 30 | 56.60 |
| | 12500 - 24999 | 123 | 53 | 43.09 |
| | 25000 - 37499 | 141 | 61 | 43.26 |
| | 37500 - 49999 | 131 | 37 | 28.24 |
| 4~8 | 0 - 12500 | 58 | 20 | 34.48 |
| | 12500 - 24999 | 72 | 30 | 41.67 |
| | 25000 - 37499 | 57 | 25 | 43.86 |
| | 37500 - 49999 | 50 | 20 | 40.00 |
| 8 or more | 0 - 12500 | 15 | 12 | 80.00 |
| | 12500 - 24999 | 24 | 11 | 45.83 |
| | 25000 - 37499 | 17 | 8 | 47.06 |
| | 37500 - 49999 | 40 | 26 | 65.00 |
| Never | 0 - 12500 | 33 | 11 | 33.33 |
| | 12500 - 24999 | 52 | 20 | 38.46 |
| | 25000 - 37499 | 85 | 42 | 49.41 |
| | 37500 - 49999 | 26 | 7 | 26.92 |
#plot bar coupon acceptance rate by income and cheap restaurant visit frequency
fig = px.histogram(df_bar3.reset_index(), x='income', y='acceptance_pct' , color="restaurant_lt_20",
labels={'income':'Income', 'restaurant_lt_20':'Visits Restaurant < 20 '},
barmode='group', width=800, height=600, text_auto=True,
title='Bar Coupons Acceptance Rate based on Income').update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Acceptance %")
Coffee house coupons have an overall acceptance rate below 50%. Let's analyze which drivers are most likely to accept a coffee house coupon.
df_coffee = df.query('coupon=="Coffee House"')
df_coffee
| | destination | passenger | weather | temperature | time | coupon | expiration | gender | age | marital_status | ... | income | bar | coffee_house | takeaway | restaurant_lt_20 | restaurant_gt_20 | coupon_accepted | direction | coupon_accepted_str | driving_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 23 | No Urgent Place | Friend(s) | Sunny | 80 | 10AM | Coffee House | 2h | Male | 21 | Single | ... | 62500 - 74999 | Never | Never | 4~8 | 4~8 | Never | 0 | 1 | No | 10 |
| 26 | No Urgent Place | Friend(s) | Sunny | 80 | 2PM | Coffee House | 1d | Male | 21 | Single | ... | 62500 - 74999 | Never | Never | 4~8 | 4~8 | Never | 0 | 1 | No | 10 |
| 27 | No Urgent Place | Friend(s) | Sunny | 80 | 2PM | Coffee House | 2h | Male | 21 | Single | ... | 62500 - 74999 | Never | Never | 4~8 | 4~8 | Never | 0 | 1 | No | 20 |
| 28 | No Urgent Place | Friend(s) | Sunny | 80 | 2PM | Coffee House | 1d | Male | 21 | Single | ... | 62500 - 74999 | Never | Never | 4~8 | 4~8 | Never | 0 | 1 | No | 20 |
| 30 | No Urgent Place | Friend(s) | Sunny | 80 | 6PM | Coffee House | 2h | Male | 21 | Single | ... | 62500 - 74999 | Never | Never | 4~8 | 4~8 | Never | 0 | 1 | No | 10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12656 | Home | Alone | Snowy | 30 | 10PM | Coffee House | 2h | Male | 31 | Married partner | ... | 100000 - 200000 | Never | Never | 4~8 | 8 or more | Never | 0 | 1 | No | 20 |
| 12659 | Work | Alone | Snowy | 30 | 7AM | Coffee House | 1d | Male | 31 | Married partner | ... | 100000 - 200000 | Never | Never | 4~8 | 8 or more | Never | 0 | 0 | No | 10 |
| 12674 | Home | Alone | Rainy | 55 | 10PM | Coffee House | 2h | Male | 26 | Single | ... | 75000 - 87499 | Never | Never | 1~3 | 4~8 | 1~3 | 0 | 0 | No | 10 |
| 12675 | Home | Alone | Snowy | 30 | 10PM | Coffee House | 2h | Male | 26 | Single | ... | 75000 - 87499 | Never | Never | 1~3 | 4~8 | 1~3 | 0 | 1 | No | 20 |
| 12681 | Work | Alone | Snowy | 30 | 7AM | Coffee House | 1d | Male | 26 | Single | ... | 75000 - 87499 | Never | Never | 1~3 | 4~8 | 1~3 | 0 | 0 | No | 10 |
3816 rows × 23 columns
fig = px.histogram(df_coffee['coupon_accepted_str'], x="coupon_accepted_str", histnorm='percent',
labels={'coupon_accepted_str':'Coupon Accepted'}, text_auto=True , title='Coffee House Coupons acceptance Rate')
fig.update_layout( yaxis_title="Percentage")
fig.show()
A total of 49.6% of all issued coffee house coupons were accepted
#Compute acceptance rate as a % value
coffee_coupons_accepted_pct = get_acceptance_pct(df_coffee, ['coffee_house'])
coffee_coupons_accepted_pct
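`get_acceptance_pct` is defined earlier in the notebook; for readers of this section, here is a minimal sketch consistent with how it is used here. The name `get_acceptance_pct_sketch` and the demo frame are illustrative, not the notebook's actual code:

```python
import pandas as pd

def get_acceptance_pct_sketch(df, group_cols):
    """Illustrative stand-in for the notebook's get_acceptance_pct helper:
    per-group coupon counts plus acceptance percentage."""
    out = (df.groupby(group_cols)['coupon_accepted']
             .agg(coupon_issued='count', coupon_accepted='sum'))
    out['acceptance_pct'] = round(out['coupon_accepted'] / out['coupon_issued'] * 100, 2)
    # Match the column order seen in the output tables in this section
    return out[['acceptance_pct', 'coupon_issued', 'coupon_accepted']]

# Tiny hypothetical sample
df_demo = pd.DataFrame({'coffee_house': ['1~3', '1~3', 'Never', 'Never'],
                        'coupon_accepted': [1, 0, 0, 1]})
result = get_acceptance_pct_sketch(df_demo, ['coffee_house'])
```

Named aggregation computes the issued count and accepted sum in one `groupby` pass, rather than counting and summing on separate grouped frames as in the bar-coupon cells above.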
#use a bar plot to visualize the acceptance rate based on coffee house visit freq
fig = px.histogram(df_coffee, x="coffee_house", color="coupon_accepted_str",
labels={'coffee_house':'Coffee House visits per month', 'coupon_accepted_str':'Coupons Accepted'},
title='Coffee House Coupons Acceptance based on coffee house visit freq').update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Coupons Count")
#Show acceptance percentage value on top of each bar
fig.add_trace(go.Scatter(
x=coffee_coupons_accepted_pct.index,
y=coffee_coupons_accepted_pct['coupon_issued'],
text=coffee_coupons_accepted_pct['acceptance_pct'],
mode='text',
textposition='top center',
textfont=dict(
size=18,
),
showlegend=False
))
fig.show()
The acceptance rate is close to 65% when a driver visits a coffee house at least once a month.
This suggests that drivers who visit a coffee house at least once a month are more likely to accept the coffee house coupon than those who never visit one.
#Compute acceptance rate as a % value
coffee_coupons_accepted_pct = get_acceptance_pct(df_coffee, ['temperature','weather'])
coffee_coupons_accepted_pct
| temperature | weather | acceptance_pct | coupon_issued | coupon_accepted |
|---|---|---|---|---|
| 30 | Snowy | 42.81 | 285 | 122 |
| | Sunny | 71.43 | 14 | 10 |
| 55 | Rainy | 51.63 | 215 | 111 |
| | Sunny | 43.82 | 1004 | 440 |
| 80 | Sunny | 52.70 | 2298 | 1211 |
The acceptance rate varies between roughly 43% and 53% across all temperatures. Sunny weather at 30F seems to have a higher acceptance rate, but only 14 coupons were issued in that case, which is too small a sample to draw conclusions from. So temperature and weather don't seem to play a big role in coupon acceptance.
#Compute acceptance rate as a % value
coffee_coupons_accepted_pct = get_acceptance_pct(df_coffee, ['age','gender','education'])
#print(coffee_coupons_accepted_pct)
#Plot bar charts based on gender to show the comparison
fig = px.histogram(coffee_coupons_accepted_pct.reset_index(), x="age", y='acceptance_pct',color='education',
facet_col="gender",nbins=8,
labels={'age':'Age', 'gender':'Gender' , 'education':'Education'},
barmode='group', text_auto=True,width = 1200,
category_orders={"education":["High School Graduate","Bachelors degree"]},
title='Coffee House coupon acceptance based on gender, age and education').update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Acceptance %", bargap=0.3)
fig.show()
There seems to be an almost even split by gender. The age-wise distribution shows that males aged 16 or younger are more likely to accept a coffee house coupon than females of the same age.
During univariate analysis, we saw that most drivers are headed to a non-urgent place with friends. Could this be a factor influencing the acceptance rate?
#Compute acceptance rate as a % value
coffee_coupons_accepted_pct = get_acceptance_pct(df_coffee, ['destination','passenger'])
coffee_coupons_accepted_pct
| destination | passenger | acceptance_pct | coupon_issued | coupon_accepted |
|---|---|---|---|---|
| Home | Alone | 35.05 | 833 | 292 |
| | Partner | 54.55 | 55 | 30 |
| No Urgent Place | Alone | 57.68 | 449 | 259 |
| | Friend(s) | 59.74 | 1175 | 702 |
| | Kid(s) | 47.15 | 193 | 91 |
| | Partner | 57.20 | 236 | 135 |
| Work | Alone | 44.00 | 875 | 385 |
fig = px.histogram(coffee_coupons_accepted_pct.reset_index(), x='passenger', y='acceptance_pct' , color="destination",
labels={'destination':'Destination', 'passenger':'Passenger'},
barmode='group', width=800, height=600, text_auto=True,
title='Coffee House Coupons Acceptance Rate based on destination and passenger').update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Acceptance %")
The dataset does not have enough data points in some destination/passenger combinations to compare acceptance rates reliably, and where data is available the rates are broadly similar. The plot does show, however, that a driver is more likely to accept a coupon when driving to a non-urgent place than when driving home alone.
#compute acceptance rate based on direction of travel from destination
coffee_coupons_accepted_pct_dir = get_acceptance_pct(df_coffee, ['direction','destination','coffee_house'])
coffee_coupons_accepted_pct_dir
| direction | destination | coffee_house | acceptance_pct | coupon_issued | coupon_accepted |
|---|---|---|---|---|---|
| 0 | Home | 1~3 | 63.51 | 74 | 47 |
| | | 4~8 | 78.26 | 46 | 36 |
| | | 8 or more | 56.67 | 30 | 17 |
| | | Never | 29.01 | 162 | 47 |
| | Work | 1~3 | 74.47 | 94 | 70 |
| | | 4~8 | 81.48 | 54 | 44 |
| | | 8 or more | 80.95 | 42 | 34 |
| | | Never | 38.32 | 214 | 82 |
| 1 | Home | 1~3 | 44.81 | 154 | 69 |
| | | 4~8 | 52.63 | 76 | 40 |
| | | 8 or more | 46.30 | 54 | 25 |
| | | Never | 14.04 | 292 | 41 |
| | No Urgent Place | 1~3 | 74.87 | 561 | 420 |
| | | 4~8 | 72.43 | 272 | 197 |
| | | 8 or more | 73.56 | 174 | 128 |
| | | Never | 42.26 | 1046 | 442 |
| | Work | 1~3 | 39.71 | 136 | 54 |
| | | 4~8 | 49.15 | 59 | 29 |
| | | 8 or more | 50.00 | 42 | 21 |
| | | Never | 21.79 | 234 | 51 |
#use a bar plot to visualize the acceptance rate based on coffee house visit frequency
#ignore drivers who never visit a coffee house since their acceptance rate is lower
fig = px.histogram(coffee_coupons_accepted_pct_dir.reset_index().query('coffee_house!="Never"'), x="destination", y='acceptance_pct',color="direction",
facet_col="coffee_house",
labels={'destination':'Destination', 'direction':'Direction from destination',
'gender':'Gender','coffee_house':'Coffee House visits' },
barmode='group', text_auto=True,
title='Coffee House Coupons Acceptance based on travel direction').update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Acceptance %")
fig.show()
Here, direction = 0 represents travelling in the same direction as the coupon location, whereas 1 represents the opposite direction. The bars show that the acceptance rate is higher when travelling toward home or work than when driving away from them.
Trips to a non-urgent place only appear with direction = 1 in this data, yet they still show a high acceptance rate.
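A small presentation tweak (hypothetical, not one of the original cells): mapping the 0/1 direction flag to readable labels before plotting makes the legend self-explanatory instead of requiring a note in the text.

```python
import pandas as pd

# Hypothetical mini-frame standing in for the reset-index grouped data above
df_demo = pd.DataFrame({'direction': [0, 1, 1]})
df_demo['direction'] = df_demo['direction'].map(
    {0: 'Same direction', 1: 'Opposite direction'})
```

The relabelled column can then be passed to `px.histogram(..., color="direction")` unchanged.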
#compute acceptance rate based on coupon attribute
coffee_coupons_accepted_pct_tte = get_acceptance_pct(df_coffee, ['expiration','has_children','coffee_house'])
coffee_coupons_accepted_pct_tte
| expiration | has_children | coffee_house | acceptance_pct | coupon_issued | coupon_accepted |
|---|---|---|---|---|---|
| 1d | 0 | 1~3 | 72.76 | 257 | 187 |
| | | 4~8 | 82.09 | 134 | 110 |
| | | 8 or more | 85.26 | 95 | 81 |
| | | Never | 40.86 | 558 | 228 |
| | 1 | 1~3 | 75.45 | 167 | 126 |
| | | 4~8 | 76.85 | 108 | 83 |
| | | 8 or more | 76.79 | 56 | 43 |
| | | Never | 39.31 | 318 | 125 |
| 2h | 0 | 1~3 | 59.58 | 381 | 227 |
| | | 4~8 | 59.41 | 170 | 101 |
| | | 8 or more | 55.10 | 98 | 54 |
| | | Never | 29.39 | 667 | 196 |
| | 1 | 1~3 | 56.07 | 214 | 120 |
| | | 4~8 | 54.74 | 95 | 52 |
| | | 8 or more | 50.54 | 93 | 47 |
| | | Never | 28.15 | 405 | 114 |
fig = px.histogram(coffee_coupons_accepted_pct_tte.reset_index().query('coffee_house!="Never"'), x="expiration", y='acceptance_pct',color="has_children",
facet_col="coffee_house",
labels={'expiration':'Expiration', 'has_children':'Children',
'gender':'Gender','coffee_house':'Coffee House visits' },
barmode='group', text_auto=True,
title='Coffee House Coupons Acceptance based on coupon expiry duration').update_xaxes(categoryorder='total ascending')
fig.update_layout( yaxis_title="Acceptance %")
fig.show()
Coupons with a longer expiration (1 day) have a higher acceptance rate than those expiring sooner.
Having children doesn't make a big difference in coffee house coupon acceptance.
#compute acceptance rate based on income
coffee_coupons_accepted_pct_income = get_acceptance_pct(df_coffee.query('coffee_house!="Never"'), ['income','time'])
coffee_coupons_accepted_pct_income
| income | time | acceptance_pct | coupon_issued | coupon_accepted |
|---|---|---|---|---|
| 0 - 12500 | 10AM | 69.44 | 36 | 25 |
| | 10PM | 56.52 | 23 | 13 |
| | 2PM | 92.00 | 25 | 23 |
| | 6PM | 60.98 | 41 | 25 |
| | 7AM | 74.36 | 39 | 29 |
| 100000 - 200000 | 10AM | 85.45 | 55 | 47 |
| | 10PM | 41.18 | 17 | 7 |
| | 2PM | 67.35 | 49 | 33 |
| | 6PM | 56.58 | 76 | 43 |
| | 7AM | 49.18 | 61 | 30 |
| 12500 - 24999 | 10AM | 86.44 | 59 | 51 |
| | 10PM | 58.33 | 12 | 7 |
| | 2PM | 73.91 | 69 | 51 |
| | 6PM | 65.48 | 84 | 55 |
| | 7AM | 55.07 | 69 | 38 |
| 25000 - 37499 | 10AM | 78.33 | 60 | 47 |
| | 10PM | 45.45 | 22 | 10 |
| | 2PM | 77.61 | 67 | 52 |
| | 6PM | 63.89 | 72 | 46 |
| | 7AM | 59.72 | 72 | 43 |
| 37500 - 49999 | 10AM | 87.93 | 58 | 51 |
| | 10PM | 47.62 | 21 | 10 |
| | 2PM | 78.26 | 46 | 36 |
| | 6PM | 60.00 | 75 | 45 |
| | 7AM | 75.93 | 54 | 41 |
| 50000 - 62499 | 10AM | 79.66 | 59 | 47 |
| | 10PM | 52.94 | 17 | 9 |
| | 2PM | 57.14 | 56 | 32 |
| | 6PM | 53.03 | 66 | 35 |
| | 7AM | 52.63 | 57 | 30 |
| 62500 - 74999 | 10AM | 65.52 | 29 | 19 |
| | 10PM | 16.67 | 6 | 1 |
| | 2PM | 75.86 | 29 | 22 |
| | 6PM | 50.00 | 42 | 21 |
| | 7AM | 60.00 | 35 | 21 |
| 75000 - 87499 | 10AM | 100.00 | 10 | 10 |
| | 10PM | 44.44 | 9 | 4 |
| | 2PM | 63.64 | 11 | 7 |
| | 6PM | 54.55 | 22 | 12 |
| | 7AM | 35.71 | 14 | 5 |
| 87500 - 99999 | 10AM | 80.56 | 36 | 29 |
| | 10PM | 60.00 | 10 | 6 |
| | 2PM | 77.42 | 31 | 24 |
| | 6PM | 58.54 | 41 | 24 |
| | 7AM | 57.69 | 26 | 15 |
fig = px.histogram(coffee_coupons_accepted_pct_income.reset_index(), x="income", y='acceptance_pct',color='time',
labels={'income':'Income', 'time':'Time'},
barmode='group', text_auto=True,
title='Coffee House Coupons Acceptance based on driver income and time of the day',
category_orders={"income": ["0 - 12500", "12500 - 24999", '25000 - 37499','37500 - 49999','50000 - 62499', '62500 - 74999','75000 - 87499', '87500 - 99999'],
"time":["7AM","10AM","2PM","6PM","10PM"]})
fig.update_layout( yaxis_title="Acceptance %")
fig.show()
The acceptance rate looks similar across the income range. Drivers receiving a coupon around 10 AM have the highest acceptance rate.
coffee_coupons_accepted_pct_bar = get_acceptance_pct(df_coffee, ['bar' ] )
coffee_coupons_accepted_pct_bar
coffee_coupons_accepted_pct_tkaway = get_acceptance_pct(df_coffee, ['takeaway' ] )
coffee_coupons_accepted_pct_tkaway
coffee_coupons_accepted_pct_rstlt20 = get_acceptance_pct(df_coffee, ['restaurant_lt_20' ] )
coffee_coupons_accepted_pct_rstlt20
coffee_coupons_accepted_pct_rstgt20 = get_acceptance_pct(df_coffee, ['restaurant_gt_20' ] )
coffee_coupons_accepted_pct_rstgt20
| restaurant_gt_20 | acceptance_pct | coupon_issued | coupon_accepted |
|---|---|---|---|
| 1~3 | 51.63 | 1042 | 538 |
| 4~8 | 56.83 | 183 | 104 |
| 8 or more | 63.74 | 91 | 58 |
| Never | 47.76 | 2500 | 1194 |
fig = go.Figure()
fig.add_trace(go.Bar(
x=coffee_coupons_accepted_pct_bar.index,
y=coffee_coupons_accepted_pct_bar['acceptance_pct'],
name='Bar Visits',
marker_color='indianred'
))
fig.add_trace(go.Bar(
x=coffee_coupons_accepted_pct_tkaway.index,
y=coffee_coupons_accepted_pct_tkaway['acceptance_pct'],
name='Takeaway frequency',
marker_color='lightsalmon'
))
fig.add_trace(go.Bar(
x=coffee_coupons_accepted_pct_rstlt20.index,
y=coffee_coupons_accepted_pct_rstlt20['acceptance_pct'],
name='Restaurant < 20',
marker_color='crimson'
))
fig.add_trace(go.Bar(
x=coffee_coupons_accepted_pct_rstgt20.index,
y=coffee_coupons_accepted_pct_rstgt20['acceptance_pct'],
name='Restaurant > 20',
marker_color='pink'
))
# Here we modify the tickangle of the xaxis, resulting in rotated labels.
fig.update_layout(barmode='group',xaxis_title='Visit frequency per month', yaxis_title="Acceptance %",
title='Coffee house coupon acceptance rate based on driver visit to other establishments')
fig.show()
It looks like the acceptance rate of the coffee house coupon does not vary much with how often drivers visit other establishments.
coffee_coupons_accepted_pct_marital = get_acceptance_pct(df_coffee.query('coffee_house!="Never"'), ['marital_status','coffee_house' ] )
coffee_coupons_accepted_pct_marital
| marital_status | coffee_house | acceptance_pct | coupon_issued | coupon_accepted |
|---|---|---|---|---|
| Divorced | 1~3 | 60.00 | 20 | 12 |
| | 4~8 | 62.50 | 32 | 20 |
| | 8 or more | 57.14 | 28 | 16 |
| Married partner | 1~3 | 63.36 | 393 | 249 |
| | 4~8 | 72.46 | 167 | 121 |
| | 8 or more | 67.20 | 125 | 84 |
| Single | 1~3 | 68.59 | 417 | 286 |
| | 4~8 | 65.49 | 226 | 148 |
| | 8 or more | 68.70 | 131 | 90 |
| Unmarried partner | 1~3 | 59.79 | 189 | 113 |
| | 4~8 | 69.51 | 82 | 57 |
| | 8 or more | 60.34 | 58 | 35 |
fig = px.histogram(coffee_coupons_accepted_pct_marital.reset_index(), x="coffee_house", y='acceptance_pct',
color='marital_status',
                   labels={'marital_status':'Marital Status','coffee_house':'Coffee house visits per month'},
                   barmode='group', text_auto=True,
                   title='Coffee House Coupons Acceptance based on marital status')
fig.update_layout( yaxis_title="Acceptance %")
fig.show()
There is no notable difference in acceptance based on marital status.
coffee_coupons_accepted_pct_time = get_acceptance_pct(df_coffee.query('coffee_house!="Never"'), ['driving_time','coffee_house' ] )
coffee_coupons_accepted_pct_time
| driving_time | coffee_house | acceptance_pct | coupon_issued | coupon_accepted |
|---|---|---|---|---|
| 10 | 1~3 | 73.92 | 487 | 360 |
| | 4~8 | 77.42 | 248 | 192 |
| | 8 or more | 67.68 | 164 | 111 |
| 20 | 1~3 | 59.38 | 448 | 266 |
| | 4~8 | 60.63 | 221 | 134 |
| | 8 or more | 65.56 | 151 | 99 |
| 30 | 1~3 | 40.48 | 84 | 34 |
| | 4~8 | 52.63 | 38 | 20 |
| | 8 or more | 55.56 | 27 | 15 |
fig = px.histogram(coffee_coupons_accepted_pct_time.reset_index(), x="coffee_house", y='acceptance_pct',
color='driving_time',
labels={'driving_time':'Driving Time to Coupon location', 'coffee_house':'Coffee house visits per month'},
barmode='group', text_auto=True,
title='Coffee House Coupons Acceptance based on driving time to coupon location')
fig.update_layout( yaxis_title="Acceptance %")
fig.show()
The shorter the driving time to the coupon location, the higher the acceptance rate of the coupon.
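As a sanity check (not part of the original notebook), the gap between the 10-minute and 30-minute groups can be tested with a simple two-proportion z-test; the counts below are the issued/accepted totals summed over the three visit-frequency rows of the driving-time table above.

```python
from math import sqrt

# Totals from the driving-time table: 10-minute vs 30-minute drives
issued_10, accepted_10 = 487 + 248 + 164, 360 + 192 + 111
issued_30, accepted_30 = 84 + 38 + 27, 34 + 20 + 15

# Two-proportion z-test with a pooled proportion
p1, p2 = accepted_10 / issued_10, accepted_30 / issued_30
p_pool = (accepted_10 + accepted_30) / (issued_10 + issued_30)
se = sqrt(p_pool * (1 - p_pool) * (1 / issued_10 + 1 / issued_30))
z = (p1 - p2) / se  # well above 1.96, so the gap is unlikely to be chance
```

The same test is available as `proportions_ztest` in `statsmodels.stats.proportion` for those who prefer a library call.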
Drivers who accept coupons tend to visit the establishment at least once a month. We saw this for both bar and coffee house coupons, irrespective of age, gender, or income.
Drivers are more likely to accept a coupon when the weather is warmer (80°F) than at colder temperatures.
Drivers on the road around 10 AM are the most likely to accept a coupon.
A male driver is most likely to accept a coffee house coupon if he is 16 or younger, and more likely to accept a bar coupon if he is 21 or older.
When the coupon location is a shorter drive away, the likelihood of acceptance is higher.
Coupons with a longer expiration (1 day) have a higher acceptance rate than those expiring sooner.
A driver heading to a non-urgent place is more likely to accept a coupon than one driving home or to work.
Next steps: analyze the restaurant and takeaway coupons in a similar fashion and verify whether the findings for the coffee house and bar coupons also apply to those categories.